Intrinsic Metrics: Nearest Neighbor and Edge Squared Distances

Authors

  • Timothy Chu
  • Gary L. Miller
  • Don Sheehy
Abstract

Some researchers have proposed using non-Euclidean metrics for clustering data points. Generally, such a metric should recognize that two points in the same cluster are close, even if their Euclidean distance is large. Several such metrics have been proposed, including the edge-squared metric (a specific example of a graph geodesic) and the nearest-neighbor metric. In this paper, we prove that the edge-squared and nearest-neighbor metrics are in fact equivalent. The previous best result showed that the edge-squared metric was a 3-approximation of the nearest-neighbor metric. This paper gives one of the first proofs equating a continuous metric with a discrete metric using non-trivial discrete methods. Our proof uses the Kirszbraun theorem (also known as the Lipschitz Extension Theorem and Brehm’s Extension Theorem), a notable theorem in functional analysis and computational geometry. The results of our paper, combined with the results of Hwang, Damelin, and Hero, tell us that the nearest-neighbor distance on i.i.d. samples of a density is a reasonable constant approximation of a natural density-based distance function.

1 Intrinsic Distances

The foundational hypothesis supporting the fields of nonlinear dimensionality reduction [23, 4, 26], geometric inference [11], surface reconstruction [14], and topological data analysis [9] is that although points can be represented by vectors of real numbers, the intrinsic metric on those points can be highly non-Euclidean. Many algorithms across all these fields start by attempting to infer the intrinsic metric. The Isomap algorithm [23] is a paradigmatic example: pairwise distances are computed using shortest paths in either the k-nearest-neighbor graph or an ε-neighborhood graph. Then, multidimensional scaling gives an embedding of these “intrinsic” distances, which, with luck, approximate geodesic distances in the underlying sample space (i.e., the support of a distribution).
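The graph shortest-path step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Isomap reference implementation: the function name `graph_geodesics` and its parameters `k` and `power` are hypothetical, the k-nearest-neighbor graph is symmetrized by assumption, and Floyd-Warshall is used only because it is short. With `power=2.0` on the complete graph, the same routine gives a shortest-path distance with squared Euclidean edge weights, which is assumed here to match the edge-squared metric named in the abstract (the excerpt does not spell out its definition).

```python
import math


def graph_geodesics(points, k=None, power=1.0):
    """All-pairs shortest-path distances on a neighborhood graph.

    points: list of equal-length coordinate tuples.
    k:      if given, connect each point to its k nearest neighbors
            (edges are made symmetric); if None, use the complete graph.
    power:  each edge weight is (Euclidean length) ** power.
    """
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]

    # Start with no edges; a point is at distance 0 from itself.
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: euclid[i][j])
        neighbors = order if k is None else order[:k]
        for j in neighbors:
            w = euclid[i][j] ** power
            dist[i][j] = min(dist[i][j], w)
            dist[j][i] = min(dist[j][i], w)

    # Floyd-Warshall relaxation: O(n^3), fine for small illustrative inputs.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][m] + dist[m][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist
```

For four collinear points spaced one unit apart, `graph_geodesics(pts, power=2.0)` assigns the two endpoints a distance of 3 (three unit hops, each of squared length 1) rather than the squared direct length 9, which shows how squared edge weights favor paths made of many short hops.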
Inspired by work on graph Laplacian convergence [15, 24], McQueen et al. [20] use ε-neighborhood graphs in their implementation of large-scale manifold learning algorithms. This approach, however, means that the approximating graph will be overconnected or underconnected for any measure in which the density varies, even for those adaptive samples [3, 14, 10] for which topologically precise reconstructions are computable.

In this paper, we show that two seemingly very different intrinsic metrics, the edge-squared metric and the nearest-neighbor density-based distance, are identical. As the former is defined as the shortest path in a weighted graph, it can be computed exactly. These metrics, or close variations thereof, have appeared together in several previous works, with the discrete metric often used to approximate a continuous one [6, 13, 18]. It was not known, or perhaps even suspected, that these might actually be the same metric. This gives the first nontrivial example of a so-called density-based distance [21] that can be computed exactly.

Among intrinsic distance approximations, density-based distances have the advantage that they extend naturally to the entire Euclidean space. The density of a sample or a distribution is used to define a new Riemannian metric on $\mathbb{R}^d$ that scales space everywhere locally so that paths through high-density regions are shorter. Thus, shortest paths will be closer to geodesics if the points are sampled from a manifold, but no such strict hypothesis is necessary. One only needs a density, an approximation of one, or a density estimate.

1.1 Preliminaries

The simplest general form of a density-based distance built from a probability density $f$ is

$$d_f(x, y) := \inf_{\gamma} \int_0^1 f(\gamma(t))^{-1/d} \, \|\gamma'(t)\| \, dt,$$

where the infimum ranges over all piecewise smooth curves $\gamma : [0, 1] \to \mathbb{R}^d$ such that $\gamma(0) = x$ and $\gamma(1) = y$.
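To make the integral concrete, the following sketch evaluates the cost of one fixed piecewise-linear curve under a given density, using the midpoint rule. It is only an illustration: the name `path_cost`, the parameter `steps_per_segment`, and the toy density `strip` are hypothetical, and the toy density is not normalized. Because the definition takes an infimum over all curves, the cost of any single curve is only an upper bound on the density-based distance.

```python
import math


def path_cost(f, waypoints, d, steps_per_segment=200):
    """Approximate the integral of f(gamma(t)) ** (-1/d) * |gamma'(t)| dt
    along the polyline through `waypoints`, using the midpoint rule.

    f:         positive density, called on a coordinate tuple.
    waypoints: list of coordinate tuples in R^d, start to end.
    d:         ambient dimension (the exponent is -1/d).
    """
    total = 0.0
    for a, b in zip(waypoints, waypoints[1:]):
        h = math.dist(a, b) / steps_per_segment  # arc length of one sub-step
        for s in range(steps_per_segment):
            t = (s + 0.5) / steps_per_segment
            mid = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
            total += f(mid) ** (-1.0 / d) * h
    return total


def strip(p):
    """Toy, unnormalized density (an assumption for illustration):
    16 on the half-plane y >= 1, and 1 elsewhere."""
    return 16.0 if p[1] >= 1.0 else 1.0


direct = path_cost(strip, [(0.0, 0.0), (4.0, 0.0)], d=2)
detour = path_cost(strip, [(0.0, 0.0), (0.0, 1.0), (4.0, 1.0), (4.0, 0.0)], d=2)
```

Here the straight segment costs about 4, while the detour through the dense region costs about 3 (1 up, 4 × 16^(-1/2) = 1 across, 1 down), even though the detour is Euclidean-longer. This is the behavior the definition is meant to capture: paths through high-density regions are shorter.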
This metric is one of the simplest that matches machine learning practitioners’ intuitions about distance on points in the support of a distribution: the distance between two points in a dense set should generally be considered shorter than the distance between two points that are not connected by a dense set, even if their Euclidean distance is the same [2].

This kind of distance is also used in percolation theory and its generalizations [17, 16, 18]. In that field, another metric is also considered, the power metric. Given a point set $Q \subset \mathbb{R}^d$ and a number $p > 0$, $d_p(x, y) = \min_{(q_0, \ldots, q_k)}$ …


Similar resources

Edge Detection Based On Nearest Neighbor Linear Cellular Automata Rules and Fuzzy Rule Based System

Edge detection is an important task for sharpening the boundary of images to detect the region of interest. This paper applies linear cellular automata rules and a Mamdani fuzzy inference model for edge detection in both monochromatic and RGB images. In the uniform cellular automata a transition matrix has been developed for edge detection. The results have been compared to the ...



The Study of Mazandaran Province Forest and Rangeland Vegetation Changes Trend by Satellite Images

Vegetation in any landscape reflects its health condition. Monitoring land use change and land cover plays a key role in environmental planning and management. Mazandaran province has always attracted tourists due to its high tourism potential, and consequently the vegetation, especially the forest, has been damaged. In this study, vegetation contiguity and integrity in Mazandaran using...


Universal Immersion Spaces for Edge-Colored Graphs and Nearest-Neighbor Metrics

There exist finite universal immersion spaces for the following: (a) Edge-colored graphs of bounded degree and boundedly many colors. (b) Nearest-neighbor metrics of bounded degree and boundedly many edge lengths.


Impossibility of Sketching of the 3D Transportation Metric with Quadratic Cost

Transportation cost metrics, also known as the Wasserstein distances Wp, are a natural choice for defining distances between two pointsets, or distributions, and have been applied in numerous fields. From the computational perspective, there has been an intensive research effort for understanding the Wp metrics over R, with work on the W1 metric (a.k.a earth mover distance) being most successfu...



Journal:
  • CoRR

Volume: abs/1709.07797

Pages: -

Publication date: 2017